The diversity of a community that cannot be fully counted must be inferred. The
two preeminent inference methods are the MaxEnt method, which uses information in
the form of constraints, and Bayes' rule, which uses information in the form of
data. It has been shown that these two methods are special cases of the method of
Maximum (relative) Entropy (ME). We demonstrate how this method can be used as a
measure of diversity that not only reproduces the features of Shannon's index but
exceeds them by allowing more types of information to be included in the
inference. A specific example is solved in detail. Additionally, the entropy that
is found is of the same form as the thermodynamic entropy.



1 Introduction 

Diversity is a concept that is used in many fields to describe the variability of
different entities in a group. In ecology, the Shannon entropy [1] and Simpson's
index [2] are the predominant measures of diversity. In this paper we focus on
the Shannon entropy for two reasons: First, it has been shown that Simpson's
index is an approximation of Shannon's [3]. Second, Shannon's entropy is closely
tied to many other areas of research, such as information theory and physics.

It is often the case that the species in a community cannot be fully counted.
In this case, when one has incomplete information, one must rely on methods of
inference. The two preeminent inference methods are the MaxEnt [4] method,
which has evolved to a more general method, the method of Maximum (relative)
Entropy (ME) [5] [6] [7], and Bayes' rule. The choice between the two methods
has traditionally been dictated by the nature of the information being processed
(either constraints or observed data). However, it has been shown that one can
accommodate both types of information in one method, ME [8]. The purpose
of this paper is to demonstrate how the ME method can be used as a measure
of diversity that is able to include more information than Shannon's measure
allows.

Traditionally, when confronted with a community whose count is incomplete,
the frequencies of the species that are counted are used to calculate the
diversity. The frequency is used because it represents an estimate of the
probability of finding a particular species in the community. However, the
frequency is not equivalent to the probability [9] and as such is a poor
estimate. Fortunately, there are much better methods for estimating or inferring
the probability, such as MaxEnt and Bayes. Even more fortunately, the new ME
method can reproduce every aspect of Bayesian and MaxEnt inference and tackle
problems that the two methods alone could not address.

We start by showing a general example of the ME method by inferring a
probability with two different forms of information, expected values^1 and data,
simultaneously. The solution resembles Bayes' rule. In fact, if there are no
moment constraints then the method produces Bayes' rule exactly. If there are no
data, then the MaxEnt solution is produced.

Finally we solve a toy ecological problem and discuss the diversity calculated 
by using Shannon's entropy and the diversity calculated by the ME method. 
This illustrates the many advantages to using the ME method. 

2 Simultaneous updating 

Our first concern when using the ME method to update from a prior to a
posterior distribution^2 is to define the space in which the search for the
posterior will be conducted. We wish to infer something about the values of one
or several quantities, $\theta \in \Theta$, on the basis of three pieces of
information: prior information about $\theta$ (the prior), the known relationship
between $x$ and $\theta$ (the model), and the observed values of the data
$x \in \mathcal{X}$. Since we are concerned with both $x$ and $\theta$, the
relevant space is neither $\mathcal{X}$ nor $\Theta$ but the product
$\mathcal{X} \times \Theta$, and our attention must be focused on the joint
distribution $P(x,\theta)$. The selected joint posterior $P_{\rm new}(x,\theta)$
is that which maximizes the entropy,

$$S[P, P_{\rm old}] = -\int dx\,d\theta\; P(x,\theta)\,\log\frac{P(x,\theta)}{P_{\rm old}(x,\theta)} , \tag{1}$$

subject to the appropriate constraints. $P_{\rm old}(x,\theta)$ contains our prior
information, which we call the joint prior. To be explicit,

$$P_{\rm old}(x,\theta) = P_{\rm old}(\theta)\,P_{\rm old}(x|\theta) , \tag{2}$$

where $P_{\rm old}(\theta)$ is the traditional Bayesian prior and
$P_{\rm old}(x|\theta)$ is the likelihood. It is important to note that they both
contain prior information. The Bayesian prior is defined as containing prior
information. However, the likelihood is not traditionally thought of in terms of
prior information. Of course it is reasonable to see it as such because the
likelihood represents the model (the relationship between $\theta$ and $x$) that
has already been established. Thus we consider both pieces, the Bayesian prior
and the likelihood, to be prior information.

^1 For simplicity we will refer to these expected values as moments although they
can be considerably more general.

^2 In Bayesian inference, it is assumed that one always has a prior probability
based on some prior information. When new information is attained, the old
probability (the prior) is updated to a new probability (the posterior). If one
has no prior information, then one uses an ignorant prior [10].

The new information is the observed data, $x'$, which in the ME framework
must be expressed in the form of a constraint on the allowed posteriors. The
family of posteriors that reflects the fact that $x$ is now known to be $x'$ is
such that

$$C_1 : \; P(x) = \int d\theta\; P(x,\theta) = \delta(x - x') . \tag{3}$$

This amounts to an infinite number of constraints: there is one constraint on
$P(x,\theta)$ for each value of the variable $x$, and each constraint will require
its own Lagrange multiplier $\lambda(x)$. Furthermore, we impose the usual
normalization constraint,

$$\int dx\,d\theta\; P(x,\theta) = 1 , \tag{4}$$

and include additional information about $\theta$ in the form of a constraint on
the expected value of some function $f(\theta)$,^3

$$C_2 : \; \int dx\,d\theta\; P(x,\theta)\,f(\theta) = \langle f(\theta) \rangle = F . \tag{5}$$

We emphasize that constraints imposed at the level of the prior need not be
satisfied by the posterior. What we do here differs from the standard Bayesian
practice in that we require the constraint to be satisfied by the posterior
distribution.

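Maximizing (1) subject to the above constraints,

$$\delta\left\{ S + \alpha\left[\int dx\,d\theta\; P(x,\theta) - 1\right] + \beta\left[\int dx\,d\theta\; P(x,\theta)\,f(\theta) - F\right] + \int dx\; \lambda(x)\left[\int d\theta\; P(x,\theta) - \delta(x - x')\right] \right\} = 0 , \tag{6}$$

yields the joint posterior,

$$P_{\rm new}(x,\theta) = P_{\rm old}(x,\theta)\,\frac{e^{\lambda(x) + \beta f(\theta)}}{Z} , \tag{7}$$

where $Z$ is determined by using the normalization constraint (4),

$$Z = \int dx\,d\theta\; e^{\lambda(x) + \beta f(\theta)}\,P_{\rm old}(x,\theta) , \tag{8}$$

and the Lagrange multipliers $\lambda(x)$ are determined by using (3),

$$e^{\lambda(x)} = \frac{Z\,\delta(x - x')}{\int d\theta\; e^{\beta f(\theta)}\,P_{\rm old}(x,\theta)} . \tag{9}$$

The posterior now becomes

$$P_{\rm new}(x,\theta) = P_{\rm old}(x,\theta)\,\delta(x - x')\,\frac{e^{\beta f(\theta)}}{\zeta(x,\beta)} , \tag{10}$$

where $\zeta(x,\beta) = \int d\theta\; e^{\beta f(\theta)}\,P_{\rm old}(x,\theta)$.

^3 Including an additional constraint in the form of
$\int dx\,d\theta\; P(x,\theta)\,g(x) = \langle g \rangle = G$ could only be used
when it does not contradict the data constraint (3). Therefore, it is redundant
and the constraint would simply get absorbed when solving for $\lambda(x)$.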
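The step from the variational condition (6) to the posterior (10) can be made
explicit. What follows is a sketch of the standard functional-derivative
calculation; the identification $Z = e^{1-\alpha}$ is an assumption made here for
bookkeeping and is not written out in the text above.

```latex
% Vary (6) with respect to P(x,\theta) at fixed multipliers:
-\log\frac{P(x,\theta)}{P_{\rm old}(x,\theta)} - 1 + \alpha + \beta f(\theta) + \lambda(x) = 0
\quad\Longrightarrow\quad
P_{\rm new}(x,\theta) = P_{\rm old}(x,\theta)\,
  \frac{e^{\lambda(x) + \beta f(\theta)}}{Z},
\qquad Z \equiv e^{1-\alpha} .

% Impose the data constraint (3) by integrating over \theta:
\int d\theta\, P_{\rm new}(x,\theta)
  = \frac{e^{\lambda(x)}}{Z}\int d\theta\, e^{\beta f(\theta)} P_{\rm old}(x,\theta)
  = \frac{e^{\lambda(x)}}{Z}\,\zeta(x,\beta)
  = \delta(x - x')
\quad\Longrightarrow\quad
\frac{e^{\lambda(x)}}{Z} = \frac{\delta(x - x')}{\zeta(x,\beta)} .
```

Substituting the last ratio back into the first line removes both $\lambda(x)$ and
$Z$ and leaves the form quoted in (10).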

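The Lagrange multiplier $\beta$ is determined by first substituting the posterior
into (5),

$$\int dx\,d\theta \left[ P_{\rm old}(x,\theta)\,\delta(x - x')\,\frac{e^{\beta f(\theta)}}{\zeta(x,\beta)} \right] f(\theta) = F . \tag{11}$$

Integrating over $x$ yields

$$\frac{\int d\theta\; e^{\beta f(\theta)}\,P_{\rm old}(x',\theta)\,f(\theta)}{\zeta(x',\beta)} = F , \tag{12}$$

where $\zeta(x,\beta) \rightarrow \zeta(x',\beta) = \int d\theta\; e^{\beta f(\theta)}\,P_{\rm old}(x',\theta)$.
Now $\beta$ can be determined by

$$\frac{\partial \ln \zeta(x',\beta)}{\partial \beta} = F . \tag{13}$$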
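That (13) is equivalent to (12) follows from differentiating under the integral
sign. The short calculation below is a sketch of that step; the second line
(the variance identity) is a standard property of exponential families added
here for context and is not stated in the text.

```latex
\frac{\partial \ln\zeta(x',\beta)}{\partial\beta}
  = \frac{1}{\zeta(x',\beta)}
    \int d\theta\; f(\theta)\, e^{\beta f(\theta)}\, P_{\rm old}(x',\theta)
  = \langle f(\theta) \rangle = F ,

\frac{\partial^2 \ln\zeta(x',\beta)}{\partial\beta^2}
  = \langle f^2 \rangle - \langle f \rangle^2 \;\ge\; 0 .
```

Because the second derivative is non-negative, $\langle f \rangle$ is a
non-decreasing function of $\beta$, so a simple one-dimensional root search is
enough to solve (13) numerically.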

The final step is to marginalize the posterior, $P_{\rm new}(x,\theta)$, over $x$
to get our updated probability,

$$P_{\rm new}(\theta) = P_{\rm old}(x',\theta)\,\frac{e^{\beta f(\theta)}}{\zeta(x',\beta)} . \tag{14}$$

Additionally, this result can be rewritten using the product rule as

$$P_{\rm new}(\theta) = P_{\rm old}(\theta)\,P_{\rm old}(x'|\theta)\,\frac{e^{\beta f(\theta)}}{\zeta(x',\beta)} , \tag{15}$$

where $\zeta(x',\beta) = \int d\theta\; e^{\beta f(\theta)}\,P_{\rm old}(\theta)\,P_{\rm old}(x'|\theta)$.
The right side resembles Bayes' theorem, where the term $P_{\rm old}(x'|\theta)$
is the standard Bayesian likelihood and $P_{\rm old}(\theta)$ is the prior. The
exponential term is a modification to these two terms. Notice that when
$\beta = 0$ (no moment constraint) we recover Bayes' rule. For $\beta \neq 0$
Bayes' rule is modified by a "canonical" exponential factor.
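A small numerical sketch may help make (15) concrete. It discretizes $\theta$ on a
grid, multiplies prior by likelihood, and searches for the $\beta$ that makes the
tilted posterior satisfy $\langle f \rangle = F$. The function name `me_update`,
the grid, the binomial-shaped likelihood and the target value `F = 0.5` are all
illustrative assumptions, not part of the original derivation; bisection is
simply one convenient way to solve (13).

```python
import numpy as np

def me_update(prior, likelihood, f, F, lo=-50.0, hi=50.0, tol=1e-10):
    """Grid version of (15): posterior proportional to
    prior * likelihood * exp(beta * f), with beta chosen so that <f> = F.
    Hypothetical helper for illustration only."""
    bayes = prior * likelihood                 # unnormalised Bayes posterior

    def posterior(beta):
        # Subtracting f.max() only rescales the weights (numerical safety).
        w = bayes * np.exp(beta * (f - f.max()))
        return w / w.sum()

    def gap(beta):
        return posterior(beta) @ f - F         # <f>_beta - F

    # <f> is non-decreasing in beta, so bisection on the bracket is sufficient.
    if gap(lo) > 0 or gap(hi) < 0:
        raise ValueError("F is not reachable within the beta bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    return posterior(beta), beta

# Illustrative inputs (assumed): theta is a success probability on a grid.
theta = np.linspace(0.001, 0.999, 999)
prior = np.full(theta.size, 1.0 / theta.size)      # flat prior on the grid
likelihood = theta**7 * (1.0 - theta)**3           # e.g. 7 successes in 10 trials
bayes_post = prior * likelihood / (prior * likelihood).sum()

# Simultaneous update: data plus the moment constraint <theta> = 0.5.
me_post, beta = me_update(prior, likelihood, f=theta, F=0.5)
```

If `F` is set to the mean of the ordinary Bayes posterior, the search returns
$\beta \approx 0$ and `me_post` coincides with `bayes_post`, which is the
$\beta = 0$ limit described above.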

It must be noted that MaxEnt has been traditionally used for obtaining a
prior for use in Bayesian statistics. When this is the case, the updating is
sequential. This is not the case here, where both types of information are
processed simultaneously. In the sequential updating case, the multiplier $\beta$
is chosen so that the posterior $P_{\rm new}$ only satisfies $C_2$. In the
simultaneous updating case the multiplier $\beta$ is chosen so that the posterior
$P_{\rm new}$ satisfies both $C_1$ and $C_2$, or $C_1 \wedge C_2$ [8].



3 Inference in Ecology 



In the following sections we will discuss the traditional way diversity is measured 
and the way it is measured using ME. This will be done by examining a simple 
example and comparing the two methods. In addition, we will show how the 
ME method could include information that the traditional method cannot. 

The general information for the example is as follows: There are $k$ types of
plants in a forest. A portion of the forest is examined and the number of each
species is counted, where $m_1, m_2, \ldots, m_k$ represent the counts of each
species and $n$ represents the total count so that $n = \sum_{i=1}^{k} m_i$.
Additionally, we know from biological examination that one species, $S_2$, and
another species, $S_5$, are codependent. Perhaps they need each other's pollen in
such supply that they cannot exist unless there are, on the average, twice the
number of $S_2$ as compared to $S_5$.

3.1 Traditional Diversity 

We calculate the Shannon diversity by using Shannon's entropy as follows,

$$S = -\sum_{i=1}^{k} p_i \log p_i , \tag{16}$$

where $p_i = m_i/n$. The problem with using this method is not in the method
itself but with the reason it is being used. If the purpose of using this method
is to measure the diversity of the portion that was counted, then the method
is acceptable. However, if the purpose of the method is to estimate or infer the
diversity of the whole forest, then it is a poor estimate. First, $p_i$ is meant
to represent the probability of finding the $i$th species in the forest. As
previously stated, the frequency of the sample is not equivalent to the
probability. In fact, it is the expected value of the frequency that is
equivalent to the probability, $\langle F \rangle = p$ [9]. It would only make
sense to use the frequency as an estimate of the probability when $n$ is very
large (i.e. $n \to \infty$), but this is not usually the case. Second, the
diversity of two samples that have the same ratio of frequencies will be the
same. Therefore this measure does not reflect the abundance of the species. This
might be a desirable feature [3]. Third, there is no clear way to process the
information about the codependence using Shannon's entropy.
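The second point above can be checked directly. The sketch below computes the
plug-in Shannon diversity with $p_i = m_i/n$ for two samples that share the same
frequency ratios but differ in total count by a factor of 100. The function name
and the counts are hypothetical and chosen only for illustration.

```python
import numpy as np

def shannon_diversity(counts):
    """Plug-in Shannon entropy (natural log) using p_i = m_i / n."""
    m = np.asarray(counts, dtype=float)
    p = m / m.sum()
    p = p[p > 0]                      # treat 0 * log 0 as 0
    return float(-(p * np.log(p)).sum())

small = [2, 1, 1, 4, 2]               # hypothetical counts, n = 10
large = [200, 100, 100, 400, 200]     # same ratios, n = 1000

s_small = shannon_diversity(small)
s_large = shannon_diversity(large)    # equal to s_small up to rounding
```

The two values agree even though the second sample carries far more evidence
about the underlying probabilities, which is the insensitivity to abundance
noted in the text.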



3.2 ME Diversity

Here we intend to use a better method to estimate or infer $p_i$, and that method
is the ME method. The first task is to realize that the correct mathematical
model for the probability of getting a particular species, where the information
that we have is the number of species counted, is a multinomial distribution. The
probability of finding $k$ species in $n$ counts which yields $m_i$ instances for
the $i$th species is

$$P_{\rm old}(m|p,n) = P_{\rm old}(m_1 \ldots m_k | p_1 \ldots p_k, n) = \frac{n!}{m_1! \ldots m_k!}\; p_1^{m_1} \ldots p_k^{m_k} , \tag{17}$$

where $m = (m_1, \ldots, m_k)$ with $\sum_{i=1}^{k} m_i = n$, and
$p = (p_1, \ldots, p_k)$ with $\sum_{i=1}^{k} p_i = 1$. The general problem is to
infer the parameters $p$ on the basis of information about the data, $m'$. Here
we see the first advantage of using the ME diversity: we allow for fluctuations
in our inference by looking at a distribution of $p$'s as opposed to claiming
that we know the "true" $p$.
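To illustrate what a "distribution of $p$'s" looks like in the simplest case,
the sketch below assumes a flat prior over the simplex and no moment constraint.
Under those assumptions the multinomial likelihood (17) gives a Dirichlet
posterior with parameters $m'_i + 1$, a standard Bayesian result that is used
here for illustration rather than taken from the text. The counts are
hypothetical.

```python
import numpy as np

def flat_prior_posterior(counts):
    """Mean and standard deviation of each p_i under Dirichlet(m' + 1),
    i.e. multinomial likelihood with an assumed flat prior."""
    a = np.asarray(counts, dtype=float) + 1.0   # Dirichlet parameters m'_i + 1
    a0 = a.sum()                                # equals n + k
    mean = a / a0
    std = np.sqrt(a * (a0 - a) / (a0**2 * (a0 + 1.0)))
    return mean, std

counts = np.array([2, 1, 1, 4, 2])   # hypothetical sample: n = 10, k = 5
freq = counts / counts.sum()         # plug-in frequencies m_i / n
mean, std = flat_prior_posterior(counts)
```

For the fourth species the frequency is 0.4 while the posterior mean is
$5/15 \approx 0.33$, and `std` quantifies the fluctuations that a single
point estimate would hide.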

Additionally we can include information about the codependence by using
the following general constraint,

$$\langle f(p) \rangle = F \quad \text{where} \quad f(p) = \sum_{i}^{k} f_i\,p_i , \tag{18}$$

where $f_i$ is used to represent the codependence. For our example, on the
average, we will find twice the number of $S_2$ as compared to $S_5$; thus, on
the average, the probability of finding one of the species will be twice that of
the other, $\langle p_2 \rangle = 2 \langle p_5 \rangle$. In this case,
$f_2 = 1$, $f_5 = -2$, and $f_{i \neq 2,5} = F = 0$.

Next we need to write the data (counts) as a constraint, which in general is

$$P(m|n) = \delta_{m m'} , \tag{19}$$

where $m' = \{m'_1, \ldots, m'_k\}$. Finally we write the appropriate entropy to
use,

$$S[P, P_{\rm old}] = -\sum_{m} \int dp\; P(m,p|n)\,\log\frac{P(m,p|n)}{P_{\rm old}(m,p|n)} , \tag{20}$$

where

$$\sum_{m} = \sum_{m_1 \ldots m_k = 0}^{n} \delta\!\left(\sum_{i=1}^{k} m_i - n\right) , \tag{21}$$

and

$$\int dp = \int dp_1 \ldots dp_k\; \delta\!\left(\sum_{i=1}^{k} p_i - 1\right) , \tag{22}$$

and where $P_{\rm old}(m,p|n) = P_{\rm old}(p|n)\,P_{\rm old}(m|p,n)$. The prior
$P_{\rm old}(p)$ is not important for our current purpose, so for the sake of
definiteness we can choose it flat for our example (there are most likely better
choices for priors). We then maximize this entropy with respect to $P(m,p|n)$
subject to normalization and our constraints, which after marginalizing over $m$
yields,

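$$P(p) = P_{\rm old}(m'|p,n)\,\frac{e^{\beta f(p)}}{\zeta} , \tag{23}$$

where

$$\zeta = \int dp\; e^{\beta f(p)}\,P_{\rm old}(m'|p,n) \quad \text{and} \quad F = \frac{\partial \log \zeta}{\partial \beta} . \tag{24}$$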



The probability distribution $P(p)$ has sometimes been criticized for being too
strange. The idea of getting a probability of a probability may seem strange at
first but makes absolute sense. We do not know the "true" distribution of
species, $p_i$. Therefore it seems natural to express our knowledge with some
uncertainty in the form of a distribution. Notice that if one has no information
relating the species then $\beta = 0$.

Finally, by substituting the posterior into (20) and using our constraints (18)
and (19), we introduce our new general measure for diversity,

$$S_{ME} = \log \zeta - \beta F . \tag{25}$$
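As a rough illustration of how (23) to (25) might be evaluated for the toy
forest, the sketch below estimates $\zeta$ by Monte Carlo. It assumes the flat
prior is taken as 1 (as in (24)), so that
$\zeta(0) = \int dp\, P_{\rm old}(m'|p,n) = n!/(n+k-1)!$ and
$\zeta(\beta) = \zeta(0)\,\langle e^{\beta f(p)} \rangle$ with the average taken
over Dirichlet($m'+1$) samples. The function name, the counts, the sample size
and the use of bisection are all assumptions made for this example; reference
[8] should be consulted for the authors' own method of calculating $\zeta$.

```python
import math
import numpy as np

def me_diversity(counts, f_coeff, F=0.0, n_samples=100_000, seed=0):
    """Monte Carlo sketch of S_ME = log(zeta) - beta * F from (24)-(25).
    Assumes a flat prior equal to 1 on the simplex (hypothetical helper)."""
    m = np.asarray(counts, dtype=float)
    n, k = m.sum(), m.size
    rng = np.random.default_rng(seed)
    p = rng.dirichlet(m + 1.0, size=n_samples)     # p ~ likelihood * flat prior
    f = p @ np.asarray(f_coeff, dtype=float)       # f(p) = sum_i f_i p_i

    def tilted_mean(beta):
        a = beta * f
        w = np.exp(a - a.max())                    # stabilised weights
        return float(w @ f / w.sum())              # estimate of d log(zeta)/d beta

    def log_mean_exp(beta):
        a = beta * f
        return float(a.max() + math.log(np.mean(np.exp(a - a.max()))))

    lo, hi = -200.0, 200.0
    if not (tilted_mean(lo) < F < tilted_mean(hi)):
        raise ValueError("F is not reachable with these samples")
    while hi - lo > 1e-10:                         # bisection on (24)
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid) < F:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)

    log_zeta0 = math.lgamma(n + 1.0) - math.lgamma(n + k)   # log[n!/(n+k-1)!]
    s_me = log_zeta0 + log_mean_exp(beta) - beta * F
    return s_me, beta, log_zeta0

# Hypothetical counts for k = 5 species; constraint <p_2> = 2 <p_5>, F = 0.
counts = [3, 6, 2, 5, 4]
f_coeff = [0.0, 1.0, 0.0, 0.0, -2.0]
s_me, beta, log_zeta0 = me_diversity(counts, f_coeff)
```

No numerical output is quoted because the estimate depends on the random seed
and sample size. With $\beta = 0$ the expression reduces to `log_zeta0`, the
value obtained from the data alone.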

4 Conclusions 

Diversity is an important concept in many fields. In this paper we provided a toy
example of how ME would be used as a measure of diversity that may simulate
real world situations. By using the multinomial, we not only properly infer $p$
so that fluctuations are represented, we get the additional bonus of having the
abundance of the species represented in the measure. It is critical to note that
our diversity, $S_{ME}$, satisfies all of Pielou's axioms [11].

This of course could all be done by using only Bayes to infer $p$. However,
by using the ME method we can include additional information, allowing us to go
beyond what Bayes' rule and MaxEnt methods alone could do. Therefore, we
would like to emphasize that anything one can do with Bayesian or MaxEnt
methods, one can now do with ME. Additionally, in ME one now has the ability
to apply additional information that Bayesian or MaxEnt methods could not.
Further, any work done with Bayesian techniques can be implemented into the
ME method directly through the joint prior.

Although Shannon had discovered the entropy that bears his name quite
independently of thermodynamic considerations, it nevertheless is directly
proportional to the thermodynamic entropy. The realization that the ME diversity
is of the exact same form as the thermodynamic entropy^4 is of no small
consequence. All of the concepts that thermodynamics utilizes can now also be
utilized in ecology, whether it be energy considerations or equilibrium
conditions, etc.

For a detailed method for calculating $\zeta$, see [8]; for a numeric example,
see [12]; and for an example of what to do when one knows that there are species
in the forest that simply have not been counted (perhaps they are rare), see [13].
Systems, Boston, 2007.

^4 The thermodynamic entropy is actually $S = \log \zeta + \beta F$. The fact that
our entropy (25) has a $-\beta F$ is a reflection of our choice to add our
Lagrange multipliers in (6) as opposed to subtracting them as is the case in
thermodynamics. However, this is trivial because when one solves for $\beta$ in
(13) the sign will be accounted for. Thus, if the Lagrange multiplier was
subtracted, the solution to (13) would be $-F$ and the entropy would have a
$+\beta F$.